Lab 3: Circuit Analysis in a Convolutional Neural Network

Team: Mike Wisniewski, Henry Lambson, Alex Gregory

Utility Functions

Nearly all functions are taken from lecture with small quality-of-life changes, but no serious alterations to the code logic. We added a "grad_cam_single_filter" function that mimics the class-provided "grad_cam" function, but at the level of a single filter (instead of the entire model).
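Since the class notebook is not reproduced here, below is a minimal sketch of how such a single-filter Grad-CAM might be implemented. The choice of `block2_conv2` as the gradient-target layer is illustrative, and `weights=None` builds the VGG19 architecture without the ImageNet download so the sketch runs standalone; the key change from the whole-model version is that the "score" is the mean activation of one filter rather than a class logit.

```python
import numpy as np
import tensorflow as tf

def grad_cam_single_filter(model, img, target_layer, filter_index, conv_layer):
    # One forward pass that exposes both the earlier conv feature maps and
    # the target layer's activations.
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer).output,
         model.get_layer(target_layer).output],
    )
    with tf.GradientTape() as tape:
        conv_out, target_out = grad_model(img)
        # "Score" = mean activation of the single chosen filter, instead of
        # a class logit as in the whole-model grad_cam.
        score = tf.reduce_mean(target_out[..., filter_index])
    grads = tape.gradient(score, conv_out)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))      # one weight per channel
    heatmap = tf.reduce_sum(conv_out[0] * pooled, axis=-1)
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()

# Demo on VGG19 built without pretrained weights (no download needed here).
model = tf.keras.applications.VGG19(weights=None, include_top=False)
img = np.random.uniform(0, 1, (1, 64, 64, 3)).astype("float32")
heatmap = grad_cam_single_filter(model, img, "block3_conv3", 12, "block2_conv2")
```

The returned heatmap has the spatial resolution of `conv_layer` and is normalized to [0, 1], ready to be upsampled and overlaid on the input image.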

Model Selection

[3 Points] In groups, you should select a convolutional neural network model that has been pre-trained on a large dataset (preferably, ImageNet). These already trained models are readily available online through many mechanisms, including the keras.applications package (Inception, Xception, VGG, etc.) https://keras.io/api/applications/

  • It is recommended to select a model with somewhat simple structure, like VGG. This can help to simplify how to extract specific filters and inputs to filters later on.
  • Explain the model you chose and why. Classify a few images with pre-trained network to verify that it is working properly.

Our group elected to use VGG19 as our pretrained model. VGG19 is a good candidate because of its high layer count: since we want to investigate filters in the middle of a model, a deeper model gives us more "mid-level" layers to choose from. VGG19 is also strong at classifying simple objects, so it is easy to find images to test against the filters, and it is included in OpenAI Microscope, so we can easily verify whether the patterns we generate for the filters are accurate. We chose the "block3_conv3" layer because it sits right in the middle of the model, and from this layer we chose a random filter to investigate, filter 12.

As shown in the example images passed through the model, VGG19 accurately classifies each picture with confidences ranging from approximately 82% to almost 100%. The model is clearly functioning correctly. To get an idea of what the model looks for when it classifies an image, we plotted heatmaps of the regions that most activate the model in each image. Below is the code for this analysis; we elaborate on each image as it comes up.

Test images used are from Imagenet and were found here: https://www.kaggle.com/datasets/lijiyu/imagenet
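A sketch of the verification pipeline follows. To keep it download-free, `weights=None` builds the architecture only and a random array stands in for a test image; swapping in `weights="imagenet"` and the real Kaggle images reproduces the confidences reported above.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

# weights=None builds the architecture only, avoiding the ~550 MB download;
# use weights="imagenet" (and real 224x224 images) to get real predictions.
model = VGG19(weights=None)  # include_top=True: 224x224 input, 1000 classes

img = np.random.uniform(0, 255, (1, 224, 224, 3)).astype("float32")  # stand-in image
x = preprocess_input(img.copy())      # RGB->BGR + ImageNet mean subtraction
preds = model.predict(x, verbose=0)   # softmax over the 1000 ImageNet classes
# With real weights, decode_predictions(preds, top=3) would yield
# (class_id, class_name, probability) triples such as "school_bus".
```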

As evident, VGG19 accurately predicts all images with high probability. Although the school bus picture had the lowest probability, there is no concern about accuracy, as 82% is still very good. One interesting point we observed is that the school bus picture contains other structures in the background. As a fun hypothesis, our team thinks that some filters will be more excited by background objects than by the school bus itself. Although we don't know whether our specific filter has this behavior, we won't rule out the possibility.

In terms of the overall model, excitement across class predictions is very similar for each picture.

  • Finch: there is strong evidence that the edge and tip of the wing are distinguishing features for the guess "goldfinch", whereas the brambling guess focuses near the center of the wing and the bulbul guess is excited by the wing and underbelly of the bird.
  • Bell pepper: we were surprised by excitement at the edge of the photo, but this suggests these bell peppers contain features that distinguish them from other classes (grocery store, cucumber). There are also excited areas on the bulbs, or tops, of the peppers, supporting the idea that these are features the model tries to identify.
  • Photocopier: we were surprised that the interface was not as exciting as we anticipated. That, along with the excitement at the tray end, suggests the distinguishing features of a photocopier are its notable paper trays.
  • School bus: the excited features are what we expected them to be.
  • Tricycle: there is evidence that the model looks for a specific wheel nut in order to classify a tricycle.

This preliminary analysis of our model is the bedrock of our further analysis below, and we will use the above framework to explore VGG19 in more depth.

Multi-Channel Filters

[4 Points] Select a multi-channel filter (i.e., a feature) in a layer in which to analyze as part of a circuit. This should be a multi-channel filter in a "mid-level" portion of the network (that is, there are a few convolutional layers before and after this chosen layer). You might find using OpenAI microscope a helpful tool for selecting a filter to analyze without writing too much code: https://microscope.openai.com/models/

  • Using image gradient techniques, find an input image that maximally excites this chosen multi-channel filter. General techniques are available from class: https://github.com/8000net/LectureNotesMaster/blob/master/04%20LectureVisualizingConvnets.ipynb
  • Also send images of varying class (i.e. from ImageNet) through the network and track which classes of images most excite your chosen filter.
  • Give a hypothesis for what this multi-channel filter might be extracting. That is, what do you think its function is in the network?
  • If using code from another source, you must heavily document the code so that I can grade your understanding of the code used.

In this section, we explore which generated image maximally excites our chosen filter, which classes from a set of test images most excite it, and what we hypothesize the filter extracts.

First, we generate an image that maximally excites our filter and compare it to OpenAI Microscope. We use the generate_pattern function from class; analysis follows.
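A hedged re-sketch of the gradient-ascent idea behind generate_pattern is below (the class notebook's exact code may differ). It starts from a near-gray image and repeatedly steps in the direction that increases the mean activation of filter 12 in block3_conv3; `weights=None` keeps the sketch download-free, while `weights="imagenet"` reproduces the actual patterns shown.

```python
import numpy as np
import tensorflow as tf

# Build VGG19 without weights so the sketch runs with no download; use
# weights="imagenet" to reproduce the actual maximally exciting patterns.
model = tf.keras.applications.VGG19(weights=None, include_top=False)
extractor = tf.keras.Model(model.input, model.get_layer("block3_conv3").output)

def generate_pattern(filter_index, size=64, steps=10, lr=1.0):
    # Start from a gray image with small noise and run gradient ascent on
    # the mean activation of the chosen filter.
    img = tf.Variable(tf.random.uniform((1, size, size, 3)) * 0.25 + 0.5)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(extractor(img)[..., filter_index])
        grads = tape.gradient(loss, img)
        # Normalize the gradient so the step size stays stable.
        grads /= tf.sqrt(tf.reduce_mean(tf.square(grads))) + 1e-5
        img.assign_add(lr * grads)  # ascent, not descent
    return img.numpy()[0]

pattern = generate_pattern(12)  # filter 12 of block3_conv3
```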

OpenAI Microscope VGG19 Unit 12 Image


Here is the input image that maximally excites filter 12 in block 3, convolution 3. Based on this image, we hypothesize that this filter is trained to activate when it finds diagonal lines going upward from left to right. The image shares similarities with the images classified as lines in the Zoom In article (https://distill.pub/2020/circuits/zoom-in/), which led us to this hypothesis. Additionally, compared with the OpenAI Microscope image pictured above, we believe we are calculating the maximally exciting input image correctly.

We output the remaining filters of the entire layer and spot-check them against Microscope. We believe we have done the image generation correctly.

Once again we use heatmaps to show where on our test images the filter is maximally excited. These heatmaps support our hypothesis that the filter activates on diagonal lines going upward from left to right, with the exception of the photocopier. The best supporting example is the tricycle: the frame of the tricycle in this image is a diagonal line going upward from left to right, and that is exactly where the filter activates most. In the school bus image, the filter activates on the lines on the bus, as well as the outlines of the trees in the background, both of which have slight upward left-to-right diagonals. In the bell pepper image, the filter activates on the edges of the peppers, which run diagonally upward from left to right. The one place where the filter is very active in the finch image is the bird's wing, which again runs diagonally upward from left to right.

The photocopier image, however, seems to activate the filter on vertical and horizontal lines. This heatmap does not support our hypothesis completely, but the filter is still activating on edge detection and lines. Our overlay of this filter onto our original images provides evidence to support our hypothesis.

[4 Points] Analyze each channel of the multi-channel filter to this feature that might form a circuit. That is, visualize the convolutional filter (one channel) between the input activations and the current activation to understand which inputs make up a circuit. One method of doing this is given below:

  • Extract the filter coefficients for each input activation to that multi-channel filter. Note: If the multi-channel filter is 5x5 with an input channel size of 64, then this extraction will result in 64 different single channel filters, each of size 5x5.
  • Keep the top six sets of inputs with the "strongest" weights. For now, you can use the L2 norm of each input filter as a measure of strength. Visualize these top six filters.
  • For these six strongest input filters, categorize each as "mostly inhibitory" or "mostly excitatory." That is, does each filter consist of mostly negative or mostly positive coefficients?

In this section, we drill down to the top 6 "strongest" sets of weights and their associated filters. In the previous exercise, we extracted outputs by indexing the fourth dimension of the weights matrix. The matrix has shape (3, 3, 256, 256), where the third dimension indexes the input activations. To extract the top 6 filter coefficients over the input activations, we slice the weights matrix along that input-activation dimension, take the L2 norm of each 3x3 slice with numpy, and sort. Finally, we plot and analyze the excitatory/inhibitory filter patterns.
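The slicing-and-ranking step can be sketched in a few lines of numpy. In the notebook the kernel comes from `model.get_layer("block3_conv3").get_weights()[0]`; here a random tensor of the same shape stands in so the sketch is self-contained.

```python
import numpy as np

# Stand-in for model.get_layer("block3_conv3").get_weights()[0],
# which has shape (kh, kw, in, out) = (3, 3, 256, 256).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3, 256, 256))

filter_index = 12
per_input = W[:, :, :, filter_index]               # (3, 3, 256): one 3x3 kernel per input channel
strength = np.linalg.norm(per_input.reshape(9, 256), axis=0)  # L2 norm of each 3x3 slice
top6 = np.argsort(strength)[::-1][:6]              # input-channel indices, strongest first
top6_filters = per_input[:, :, top6]               # (3, 3, 6): the six kernels to visualize
```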

Although each filter extracts different features, we noticed that filter 0 seems to extract horizontal lines or blocks, and filter 1 has a backward-L pattern, suggesting that features with a run-over-rise pattern are important. Filter 2 is mostly inhibitory to what appears to be a top-left-to-bottom-right pattern. Filter 3 appears to be inhibitory in a "C" shape. Filter 4 appears to be inhibitory to features that represent a sideways "L". We do not understand filter 5 very well. In these heatmaps, red corresponds to excitatory and blue to inhibitory. Based on this, filters 0 and 1 are mostly excitatory, filter 2 is split between excitatory and inhibitory, filters 3 and 4 are mostly inhibitory, and filter 5 is completely inhibitory. Therefore, each filter is as follows:
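The "mostly excitatory" versus "mostly inhibitory" call from the prompt reduces to a one-line rule: does the majority of a kernel's coefficients sit above or below zero? A sketch, with hypothetical kernels illustrating each case:

```python
import numpy as np

def categorize(kernel):
    """Label a kernel by the sign of the majority of its coefficients."""
    frac_positive = (kernel > 0).mean()
    return "mostly excitatory" if frac_positive >= 0.5 else "mostly inhibitory"

# Hypothetical 3x3 kernels illustrating each case.
excit = np.array([[0.5, 0.2, 0.1], [0.3, -0.1, 0.4], [0.2, 0.3, 0.1]])
inhib = -excit
labels = [categorize(excit), categorize(inhib)]
```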

[4 Points] For each of the six chosen single channels of the filter, use image gradient techniques to visualize what each of these filters is most excited by (that is, what image maximally excites each of these filters?). This is a similar analysis to the first step, but now isolating the filters directly before your chosen filter.

In this section, we visualize our top 6 filters alongside the patterns that maximally excite the corresponding filters of the previous layer. This shows and compares what features each filter is trying to identify. Analysis follows the code.
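In VGG19 the layer feeding block3_conv3 is block3_conv2, so its output filters are the input channels we ranked. A self-contained sketch of the loop is below; the six indices are placeholders for the actual top-6 channels found earlier, and `weights=None` again keeps the sketch download-free.

```python
import numpy as np
import tensorflow as tf

# block3_conv2 feeds block3_conv3, so its output filters are the input
# channels ranked by L2 norm. The indices below are placeholders.
model = tf.keras.applications.VGG19(weights=None, include_top=False)
extractor = tf.keras.Model(model.input, model.get_layer("block3_conv2").output)

def pattern_for(filter_index, size=64, steps=5, lr=1.0):
    # Same gradient-ascent recipe as before, pointed at the previous layer.
    img = tf.Variable(tf.random.uniform((1, size, size, 3)) * 0.25 + 0.5)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(extractor(img)[..., filter_index])
        grads = tape.gradient(loss, img)
        grads /= tf.sqrt(tf.reduce_mean(tf.square(grads))) + 1e-5
        img.assign_add(lr * grads)
    return img.numpy()[0]

top6 = [0, 1, 2, 3, 4, 5]  # placeholder channel indices
patterns = [pattern_for(i) for i in top6]
```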

The input image for Filter 0 of the previous layer shows a "beaded" horizontal pattern distinguished by color (green and purple). Our filter heatmap shows excitatory behavior for horizontal block patterns, providing evidence that filter 0 identifies horizontal patterns.

In Filter 1, the input image from the previous layer shows a "scale" pattern where lines are typically going downward diagonally from top right to bottom left. Our filter heatmap shows excitatory behavior for backward L shape patterns, or as we interpret it: top right to bottom left diagonals. The filter heatmap and associated input pattern provide evidence to support that the filter is looking for top right to bottom left diagonal patterns.

Filter 2 shows us an input image with near horizontal but slightly downward patterns from top left to bottom right. Our filter heatmap shows a similar excitatory pattern within the bottom half, thereby providing evidence that filter 2 is looking for near horizontal but slightly downward left to right patterns.

Our team was initially stumped by Filter 3 because we thought the defining features were horizontal lines. However, looking at our filter heatmap and the input pattern, we noticed very subtle horizontal lines, which we call tiny bands. Our heatmap shows a pattern that supports this hypothesis, and we believe there is evidence to suggest that filter 3 identifies tiny banded horizontal patterns.

The input image for Filter 4 of the previous layer shows downward diagonal lines from top left to bottom right. However, the heatmap pattern does not support this being what the filter is most excited by. Looking at the heatmap, we believe there is some evidence that the filter is most excited by tiny dots scattered throughout the image. This is the best analysis we could draw from this input image and filter.

For Filter 5, we believe that the input image is supposed to represent a scaled pattern. The associated heatmap is all inhibitory. We believe this suggests that when the model detects scaled patterns as shown in the input image, this filter inhibits the patterns from being recognized further down the model. However, we concluded that this filter was tough to accurately interpret.

Based on the analysis of the above 6 filters, there is some evidence to suggest that the layer our multi-channel filter sits in (block3_conv3) is polysemantic - that is, it responds to multiple unrelated features. This is evident with filters 0, 1, 2, and 3: each identifies a different feature (horizontal lines, upward left-to-right diagonals, downward left-to-right diagonals, bands). If the layer were monosemantic, we would expect more of the input images and associated filters to identify upward left-to-right diagonals; we would see purer features.

There is also some evidence that these filters act as curve detectors, as shown in the analysis above. Observing our heatmaps (feature visualizations), we believe some filters identify curves. We did not see evidence that our filter identifies high-low frequency features or pose-invariant features.

Final Analysis

Recall the original hypothesis: "this filter is trained to activate when it finds diagonal lines going upward from left to right". Does our analysis support or refute it? Maybe. We believe there is no strong evidence to support the hypothesis, but also not enough evidence to refute it; the evidence further suggests the filter may be trying to identify more than one pattern. Breaking down our evidence for and against:

For:

Against:

Expanding:

In conclusion, we believe our hypothesis is not currently supported by the evidence, but neither is it refuted; it may simply need to be expanded. Additionally, we believe more than 6 input filters are needed for these observations. Six is too small for a thorough analysis because our layer has 256 input channels, as opposed to other layers that may have 64. Had we analyzed 30 filters, we could have concluded more confidently whether our hypothesis was sound.